Improving Bitext Word Alignments via Syntax-based Reordering of English

Authors

Elliott Franco Drábek

David Yarowsky

Abstract

We present an improved method for automated word alignment of parallel texts that takes advantage of knowledge of syntactic divergences while avoiding the need for syntactic analysis of the less resource-rich language and retaining the robustness of syntactically agnostic approaches such as the IBM word alignment models. We achieve this by using simple, easily elicited knowledge to produce syntax-based heuristics that transform the target language (e.g. English) into a form more closely resembling the source language, and then using standard alignment methods to align the transformed bitext. We present experimental results under variable resource conditions. The method improves word alignment performance for language pairs such as English-Korean and English-Hindi, which exhibit longer-distance syntactic divergences.
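The reorder-then-align idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the single verb-final heuristic, the coarse tag names (VERB, PUNCT), and the function names reorder_verb_final and map_alignment_back are assumptions made for illustration only. The sketch reorders a tagged English sentence toward a verb-final order, records the permutation, and uses that permutation to map alignment links found on the reordered sentence back to the original English positions.

```python
from typing import List, Tuple


def reorder_verb_final(tagged: List[Tuple[str, str]]) -> Tuple[List[str], List[int]]:
    """Toy heuristic (an assumption, not the paper's rule set).

    Moves every token tagged VERB after the non-verb tokens and moves every
    token tagged PUNCT to the very end, which is a deliberate simplification.
    Returns the reordered tokens and a permutation where perm[i] is the
    original index of the token now at position i.
    """
    verbs = [i for i, (_, tag) in enumerate(tagged) if tag == "VERB"]
    punct = [i for i, (_, tag) in enumerate(tagged) if tag == "PUNCT"]
    rest = [i for i, (_, tag) in enumerate(tagged) if tag not in ("VERB", "PUNCT")]
    perm = rest + verbs + punct
    return [tagged[i][0] for i in perm], perm


def map_alignment_back(
    links: List[Tuple[int, int]], perm: List[int]
) -> List[Tuple[int, int]]:
    """Convert (reordered_english_pos, other_pos) links into
    (original_english_pos, other_pos) links using the permutation."""
    return sorted((perm[e], f) for e, f in links)


if __name__ == "__main__":
    sentence = [("John", "NOUN"), ("ate", "VERB"), ("the", "DET"),
                ("apple", "NOUN"), (".", "PUNCT")]
    tokens, perm = reorder_verb_final(sentence)
    print(tokens)  # ['John', 'the', 'apple', 'ate', '.']

    # Hypothetical links that a standard aligner might return on the
    # reordered bitext; the positions on the other side are made up.
    links = [(0, 0), (2, 1), (3, 2)]
    print(map_alignment_back(links, perm))  # [(0, 0), (1, 2), (3, 1)]
```

Keeping the permutation alongside the reordered tokens is what lets any off-the-shelf aligner be used unchanged: the aligner only sees the transformed bitext, and the links are translated back afterwards.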

Related resources

Diversify and Combine: Improving Word Alignment for Machine Translation on Low-Resource Languages

We present a novel method to improve word alignment quality and eventually the translation performance by producing and combining complementary word alignments for low-resource languages. Instead of focusing on the improvement of a single set of word alignments, we generate multiple sets of diversified alignments based on different motivations, such as linguistic knowledge, morphology and heuri...

Experiments with word alignment, normalization and clause reordering for SMT between English and German

This paper presents the LIU system for the WMT 2011 shared task for translation between German and English. For English–German we attempted to improve the translation tables with a combination of standard statistical word alignments and phrase-based word alignments. For German–English translation we tried to make the German text more similar to the English text by normalizing German morphology...

Word Alignment-Based Reordering of Source Chunks in PB-SMT

Reordering poses a big challenge in statistical machine translation between distant language pairs. The paper presents how reordering between distant language pairs can be handled efficiently in phrase-based statistical machine translation. The problem of reordering between distant languages has been approached with prior reordering of the source text at chunk level to simulate the target langu...

Efficient Statistical Machine Translation with Constrained Reordering

This paper describes how word alignment information makes machine translation more efficient. Following a statistical approach based on finite-state transducers, we perform reordering of source sentences in training using automatic word alignments and estimate a phrase-based translation model. Using this model, we translate monotonically taking a permutation graph as input. The permutation grap...

Using Syntax to Improve Word Alignment Precision for Syntax-Based Machine Translation

Word alignments that violate syntactic correspondences interfere with the extraction of string-to-tree transducer rules for syntax-based machine translation. We present an algorithm for identifying and deleting incorrect word alignment links, using features of the extracted rules. We obtain gains in both alignment quality and translation quality in Chinese-English and Arabic-English translation ...

Publication date: 2004
